
Conversation

@simon-mo (Collaborator) commented Oct 6, 2025

Summary

  • Clarify the metrics design doc so the Prometheus middleware note no longer references the legacy V0 engine migration
  • Update the speculative decoding guide to state that draft-model support requires the V1 engine instead of pointing to the retired v0.10 release

Testing

  • not run (documentation changes only)

https://chatgpt.com/codex/tasks/task_e_68e3f11c47408329bf2324ac7b1ad7bf

@mergify bot added the `documentation` (Improvements or additions to documentation) label on Oct 6, 2025
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request provides a number of documentation updates to remove references to the legacy v0 engine and clarify concepts for the current v1 engine. The changes are well-executed across multiple files, improving the clarity and relevance of the documentation for users. The updates are consistent with the stated goals of the PR, and I have no further suggestions.


We have started the process of deprecating V0. Please read [RFC #18571](gh-issue:18571) for more details.

V1 is now enabled by default for all supported use cases, and we will gradually enable it for every use case we plan to support. Please share any feedback on [GitHub](https://github.com/vllm-project/vllm) or in the [vLLM Slack](https://inviter.co/vllm-slack).
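
(Not part of the diff, just a sketch for readers tracking the migration: while V0 is being phased out, the engine can still be pinned explicitly, assuming the `VLLM_USE_V1` environment variable vLLM has exposed for this transition.)

```python
# Sketch (assumption): pin the engine explicitly while the V0 deprecation is
# in progress. VLLM_USE_V1=1 forces the V1 engine; VLLM_USE_V1=0 falls back
# to V0 where a use case is not yet supported on V1.
import os

os.environ["VLLM_USE_V1"] = "1"  # must be set before importing vllm

from vllm import LLM

llm = LLM(model="facebook/opt-125m")
print(llm.generate(["Hello, my name is"])[0].outputs[0].text)
```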
Member


Also update this paragraph?

| **Mamba Models** | <nobr>🟢 (Mamba-2), 🟢 (Mamba-1)</nobr> |
| **Multimodal Models** | <nobr>🟢 Functional</nobr> |

vLLM V1 currently excludes model architectures with the `SupportsV0Only` protocol.
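
(For context, not from the diff: a rough sketch of how that exclusion works, assuming `SupportsV0Only` is a marker protocol in `vllm.model_executor.models.interfaces` alongside the other `Supports*` interfaces.)

```python
# Rough sketch (import path and usage are assumptions; check the vLLM source).
# A model implementation advertises that it only runs on the legacy V0 engine
# by inheriting from the SupportsV0Only marker protocol; V1 skips such
# architectures when resolving a model.
import torch.nn as nn

from vllm.model_executor.models.interfaces import SupportsV0Only


class MyLegacyModelForCausalLM(nn.Module, SupportsV0Only):
    """Hypothetical architecture that has not been ported to V1 yet."""
```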
@DarkLight1337 (Member) commented Oct 6, 2025


We should remove the V1 column from the Supported Models page and delete all models that don't support V1

Chunked prefill allows vLLM to process large prefills in smaller chunks and batch them together with decode requests. This feature helps improve both throughput and latency by better balancing compute-bound (prefill) and memory-bound (decode) operations.

In vLLM V1, **chunked prefill is always enabled by default**. This is different from vLLM V0, where it was conditionally enabled based on model characteristics.
In vLLM V1, **chunked prefill is always enabled by default** so that behavior is consistent across supported models.
Collaborator Author


Suggested change
In vLLM V1, **chunked prefill is always enabled by default** so that behavior is consistent across supported models.
In vLLM V1, **chunked prefill is always enabled by default**.
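
(Side note for readers of that section, not part of the diff: with chunked prefill always on in V1, the main user-facing knob is the per-step token budget; a minimal sketch, assuming the long-standing `max_num_batched_tokens` engine argument.)

```python
# Sketch (assumption): with chunked prefill always enabled in V1, the per-step
# token budget controls how large each prefill chunk plus decode batch can be.
# Smaller values favor inter-token latency; larger values favor throughput.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",
    max_num_batched_tokens=2048,  # caps tokens scheduled per engine step
)

outputs = llm.generate(
    ["Summarize chunked prefill in one sentence."],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```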

Collaborator Author


There are probably some mistakes here. @markmc PTAL

Collaborator Author


@njhill I guess this page can use a full cleanup

Comment on lines +19 to +20
Speculative decoding with a draft model requires the V1 engine.
Older releases that predate V1 (such as the 0.10.x series) raise a `NotImplementedError`.
Collaborator Author


Suggested change
Speculative decoding with a draft model requires the V1 engine.
Older releases that predate V1 (such as the 0.10.x series) raise a `NotImplementedError`.
Speculative decoding with a draft model is not supported in vLLM V1.
You can use an older version, from before the 0.10.x series, to continue to leverage it.
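
(For reference, not from the PR: when draft-model speculative decoding is available, it is typically configured through the `speculative_config` engine argument; a rough sketch with purely illustrative model names, and with availability depending on the engine version as discussed above.)

```python
# Rough sketch (assumption): draft-model speculative decoding configured via
# speculative_config. Whether this path works depends on the engine version,
# as discussed in this thread; model names are illustrative only.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-6.7b",              # target model (illustrative)
    speculative_config={
        "model": "facebook/opt-125m",       # smaller draft model (illustrative)
        "num_speculative_tokens": 5,        # draft tokens proposed per step
    },
)

out = llm.generate(["The future of AI is"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```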


Member


We should remove the V1 column from the Supported Models page and delete all models that don't support V1

LGTM after doing this

Collaborator Author


We can probably gradually remove these docs

Labels: codex, documentation